Sequence Alignment in DNA Using Smith Waterman and Needleman Algorithms

Author

  • M. P. Sudha

Abstract

Which algorithm and scoring parameters are "best"? Two methods are used to search protein and DNA databases when studying the evolution of protein and DNA sequences:

1. Local comparison
   i) Ignores differences outside the most similar region
   ii) Finds the similarity between two sequences
2. Global comparison

Global comparison is more appropriate when homology has already been established, for example when building evolutionary trees, whereas local comparison methods are preferred for functionally conserved, non-homologous domains. When searching protein sequence databases, avoiding high similarity scores with unrelated sequences is as important as calculating high scores for related sequences. Thus no single combination of comparison algorithm, scoring matrix, and gap penalty is the most effective in every case.

INTRODUCTION

Cells are the fundamental working units of every living system, and all the instructions that direct their activities are contained in DNA (deoxyribonucleic acid). DNA from every organism consists of the same chemical and physical components. A DNA sequence is the side-by-side arrangement of bases along a strand (e.g., ATTCCGGA). A genome is an organism's complete set of DNA. Genomes vary in size: the smallest known genome of a free-living organism contains about 600,000 DNA base pairs, while the human and mouse genomes each contain about 3 billion.

DNA in the human genome is arranged into 24 distinct chromosomes, physically separate molecules that range in length from about 50 million to 250 million base pairs. Major chromosomal abnormalities, including missing or extra copies or gross breaks and rejoinings (translocations), can be detected by microscopic examination. Each chromosome contains many genes. Genes make up only about 2% of the human genome; the rest consists of noncoding regions. The human genome is estimated to contain about 30,000 genes (more recent estimates put the number of protein-coding genes near 20,000). The proteins that genes encode perform most cellular functions and make up most cellular structures.

Proteins are large, complex molecules built from subunits called amino acids. The chemical properties that distinguish the 20 different amino acids cause protein chains to fold into three-dimensional structures that define their functions in the cell. The constellation of all proteins in a cell is called its proteome. The dynamic proteome changes from minute to minute. A protein's chemistry and behavior are specified by the gene sequence and by the number and identities of the other proteins made in the same cell.
Studies that explore protein structure and activities, known as proteomics, focus research on the molecular basis of health and disease.

What is Bio-Informatics?

Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

• Study of biological information
• Interface of biology and computers
• Computational molecular biology
• Includes genomics

Subfields: DNA informatics, protein informatics, proteomics.

Comparison of sequences

The most fundamental operation in protein informatics is finding the best alignment between a query sequence and one or more additional sequences. Once candidate homologs have been identified, they can be evaluated using statistical methods together with structural and biological information. The correspondence between two aligned sequences can be expressed as a similarity score and/or viewed graphically, e.g., as dot plots, alignments, motifs, or patterns.
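As an illustration of how an alignment yields a similarity score, the following Python sketch computes the optimal global score (Needleman-Wunsch) and the optimal local score (Smith-Waterman) for two sequences by dynamic programming. The function names and the match, mismatch, and linear gap values are assumptions chosen for illustration; they are not parameters taken from the paper, and only the score is returned, not the alignment itself.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: every residue of a and b must be aligned.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-scoring pair of substrings of a and b.

    Cells are clamped at 0, so differences outside the most similar
    region are ignored.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,
                       prev[j - 1] + sub,
                       prev[j] + gap,
                       cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best
```

The only differences between the two functions are the initialization of the first row and column, the clamp at zero, and where the final score is read: the last cell for the global case, the maximum over all cells for the local case.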
SCORING SYSTEMS

PAM matrices

• Using many sets of two aligned sequences, for each amino acid pair $A_i$, $A_j$, count the number of times $A_i$ aligns with $A_j$ and divide that number by the total number of amino acid pairs in all of the alignments; the result is the frequency $f(i,j)$
• Let $f_i$ and $f_j$, respectively, denote the frequencies at which $A_i$ and $A_j$ appear in the sets of sequences
• Then the $(i,j)$ entry of the ideal PAM matrix is $\log \frac{f(i,j)}{f_i \, f_j}$

BLOSUM (BLOcks SUbstitution Matrices)

• Many sequences from aligned families are used to generate the matrices
• Sequences identical at >X% are eliminated to avoid bias from proteins over-represented in the database
• Specific matrices refer to these clustering cut-offs, i.e., BLOSUM62 reflects observed substitutions between segments <62% identical
• In analogy to PAM matrices, a log-odds matrix is calculated from the frequencies $A_{ij}$ of observing residue $i$ in one cluster aligned against residue $j$ in another cluster

Properties of Sequence Alignment (DNA)

• Should use an evolution-sensitive measure of similarity
• Should allow for alignment on exons, which means searching for local alignments as opposed to global alignments

M. P. Sudha et al. / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (4), 2014, 5957-5960
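The log-odds entry defined under PAM matrices above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `log_odds_scores` is hypothetical, the input alignments are assumed to be gap-free and of equal length, each aligned column is counted in both orders so that the result is symmetric, and base-2 logarithms are used because the source only says "log". Pairs that are never observed get no entry, since the logarithm of zero is undefined.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Return {(x, y): log2(f(x,y) / (f_x * f_y))} for observed pairs.

    aligned_pairs: iterable of (seq_a, seq_b) strings of equal length,
    assumed to contain no gap characters (illustrative assumption).
    """
    pair_counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have equal length")
        for x, y in zip(seq_a, seq_b):
            # Count both orders so that score(x, y) == score(y, x).
            pair_counts[(x, y)] += 1
            pair_counts[(y, x)] += 1

    total = sum(pair_counts.values())

    # Residue frequencies f_x are the marginals of the pair frequencies.
    residue_freq = Counter()
    for (x, _), n in pair_counts.items():
        residue_freq[x] += n / total

    return {
        (x, y): math.log2((n / total) / (residue_freq[x] * residue_freq[y]))
        for (x, y), n in pair_counts.items()
    }
```

A positive score means the pair is aligned more often than expected from the residue frequencies alone, and a negative score means it is aligned less often.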


Related resources

Multiple Sequence Alignment Using MATLAB

Sequence alignment is an important task in bioinformatics which involves a typical database search where the data is in the form of DNA, RNA or protein sequences. For alignment, various methods have been devised, from pairwise alignment to multiple sequence alignment (MSA). To perform multiple sequence alignment, various methods exist, like progressive, iterative and concepts of dynamic programm...


Handling Rearrangements in DNA Sequence Alignment

Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome assembly, gene identification, and phylogenetic analysis [1]. Alignments between DNA sequences are used to infer evolutionary or functional relationships between genes. Evolution occurs through DNA mutations, which include small-scale edits and larger-scale rearrangement events. T...


Fast Sequence Alignment Method Using CUDA-enabled GPU

Sequence alignment is a task that calculates the degree of similarity between two sequences. Given a query sequence, finding a database sequence which is most similar to the query by sequence alignment is the first step in bioinformatics research. The first sequence alignment algorithm was proposed by Needleman and Wunsch. They got the optimal global alignment by using dynamic programming metho...


An Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST

Introduction As databases of protein sequences and properties increase in size, it becomes more and more reliable to depend on previously classified proteins to determine the structure and function of a novel protein. One method of determining homology between two proteins is through a pair-wise sequence alignment of their primary structures. It has been found that two proteins that are homolog...


Designing and Building a Framework for DNA Sequence Alignment Using Grid Computing

Deoxyribonucleic acid (DNA) is a molecule that encodes unique genetic instructions used in the development and functioning of all known living organisms and many viruses. This genetic information is encoded as a sequence of nucleotides (adenine, cytosine, guanine, and thymine) recorded using the letters A, C, G, and T. DNA querying or alignment of these sequences required dynamic programming too...


Acceleration of Biological Sequence Alignment using Recursive Variable Expansion

Biological sequence alignment is one of the most important problems in computational biology. Two given sequences of varying length are aligned such that the alignment score is maximum. The alignment score is calculated based on the number of matches, mismatches and gaps in the suggested alignment. The basic sequence alignment algorithms are Needleman-Wunsch (NW) algorithm and Smith-Waterman (S...



Journal: (IJCSIT) International Journal of Computer Science and Information Technologies

Volume 5, Issue 4

Pages 5957-5960

Publication date: 2014